Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs
Abstract
A new approach to large-scale information extraction exploits both Web documents and query logs to acquire thousands of open-domain classes of instances, along with relevant sets of open-domain class attributes, at precision levels previously obtained only on small-scale, manually assembled classes.
Similar resources
Turning Web Text and Search Queries into Factual Knowledge: Hierarchical Class Attribute Extraction
A seed-based framework for textual information extraction allows for weakly supervised acquisition of open-domain class attributes over conceptual hierarchies, from a combination of Web documents and query logs. Automatically extracted labeled classes, consisting of a label (e.g., painkillers) and an associated set of instances (e.g., vicodin, oxycontin), are linked under existing conceptual hie...
What You Seek Is What You Get: Extraction of Class Attributes from Query Logs
Within the larger area of automatic acquisition of knowledge from the Web, we introduce a method for extracting relevant attributes, or quantifiable properties, for various classes of objects. The method extracts attributes such as capital city and President for the class Country, or cost, manufacturer and side effects for the class Drug, without relying on any expensive language resources or co...
Open Entity Extraction from Web Search Query Logs
In this paper we propose a completely unsupervised method for open-domain entity extraction and clustering over query logs. The underlying hypothesis is that classes defined by mining search user activity may significantly differ from those typically considered over web documents, in that they better model the user space, i.e. users’ perception and interests. We show that our method outperforms...
Low-Cost Supervision for Multiple-Source Attribute Extraction
Previous studies on extracting class attributes from unstructured text consider either Web documents or query logs as the source of textual data. Web search queries have been shown to yield attributes of higher quality. However, since many relevant attributes found in Web documents occur infrequently in query logs, Web documents remain an important source for extraction. In this paper, we intro...